ViSL model: a model for automatically generating sentences in Vietnamese Sign Language

Abstract

The main obstacle to building intelligent systems is the lack of training data for machine learning, a problem that is especially acute for sign language recognition for deaf and hard-of-hearing people. One way to increase the amount of training data is synthesis. Unlike in speech synthesis, in Vietnamese and some other languages it is impossible to create a sequence of gestures that exactly reproduces the text, because the gesture dictionary is significantly limited and the word order in sentences differs. The aim of this work is to enrich the training corpus of video data used to build recognition systems for Vietnamese Sign Language (ViSL). Since the words of the source text cannot be translated into gestures one to one, the task becomes one of translating from an ordinary spoken language into a sign language. The paper proposes a two-phase process for this. The first phase pre-processes the text: the text format is standardized, words and sentences are segmented, and the words are then encoded using the sign language dictionary. At this stage, punctuation marks and stop words do not need to be removed, since they affect the accuracy of the N-gram model. In the second phase, a statistical method is used instead of syntactic analysis to form the sequence of gestures. It is based on a Markov model on the transition graph between words, in which the probability of the next word depends only on the two previous words. Transition probabilities are calculated on the existing annotated ViSL corpus. Breadth-first search is used to compile a list of all sentences that can be generated from a given grammatical rule and a matrix of semantic interactions between words. The inverse of the logarithm of the product of the co-occurrence probabilities of consecutive three-word phrases in a sentence is used to estimate how frequently that sentence occurs in a given data set.
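The second-order Markov model described above can be illustrated with a minimal Python sketch. It assumes maximum-likelihood trigram estimates with no smoothing, and the function names, the padding tokens `<s>` and `</s>`, and the toy corpus are all hypothetical. The exact form of the inverse-logarithm score is not spelled out in the abstract, so the sketch stops at the logarithm of the product of trigram probabilities, from which such a score could be derived.

```python
import math
from collections import Counter

def train_trigram(corpus):
    """Count trigrams and their two-word contexts in tokenized sentences.

    Punctuation and stop words are kept as ordinary tokens.
    """
    tri, ctx = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>", "<s>"] + list(sent) + ["</s>"]  # hypothetical padding
        for a, b, c in zip(toks, toks[1:], toks[2:]):
            tri[(a, b, c)] += 1
            ctx[(a, b)] += 1
    return tri, ctx

def trigram_prob(tri, ctx, a, b, c):
    """Maximum-likelihood estimate of P(c | a, b); 0.0 for an unseen context."""
    n = ctx[(a, b)]
    return tri[(a, b, c)] / n if n else 0.0

def sentence_log_prob(tri, ctx, sent):
    """Log of the product of trigram probabilities along the sentence."""
    toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
    total = 0.0
    for a, b, c in zip(toks, toks[1:], toks[2:]):
        p = trigram_prob(tri, ctx, a, b, c)
        if p == 0.0:
            return float("-inf")  # an unseen trigram makes the product zero
        total += math.log(p)
    return total

# Toy corpus, for illustration only (not data from the paper).
corpus = [["i", "like", "tea", "."], ["i", "like", "coffee", "."]]
tri, ctx = train_trigram(corpus)
print(trigram_prob(tri, ctx, "i", "like", "tea"))
print(sentence_log_prob(tri, ctx, ["i", "like", "tea", "."]))
```

A real system would likely need smoothing or back-off for unseen trigrams; this sketch simply assigns them zero probability.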
Using ViSL data comprising 3,234 words, we calculated probability matrices representing the relationships between words from Vietnamese natural language data of 50 million sentences collected from Vietnamese newspapers and magazines. For different grammar rules, we compare the number of generated sentences and evaluate the accuracy of the 50 most frequent sentences. The average accuracy is 88%. The accuracy of the generated sentences was estimated manually. The number of generated sentences depends on the number of words labeled with the parts of speech that appear in the grammar rules. The semantic accuracy of the generated sentences is high when the search words carry the correct part-of-speech tags. Compared with machine learning methods, the proposed method gives good results for languages without inflection and with rule-governed word order, such as Vietnamese, and does not require large computational resources. Its disadvantage is that accuracy depends heavily on the word type, the sentence type, and word segmentation, and that the relationships between words depend on the observed dataset. A direction for future research is the generation of paragraphs in sign language. The generated data can be used in machine learning models for sign language processing tasks.
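To show why the number of generated sentences grows with the number of words labeled for each part of speech, the breadth-first enumeration can be sketched as follows. This is only an illustration under stated assumptions: the grammar rule is represented as a list of part-of-speech tags, the semantic interaction matrix as a set of allowed adjacent word pairs, and the names `generate_sentences`, `lexicon`, and `compatible`, as well as the toy words, are hypothetical rather than taken from the paper.

```python
from collections import deque

def generate_sentences(rule, lexicon, compatible):
    """Breadth-first enumeration of all word sequences matching a POS rule.

    rule:       list of part-of-speech tags, e.g. ["N", "V", "N"]
    lexicon:    dict mapping a tag to the words labeled with it
    compatible: set of (previous_word, next_word) pairs allowed to be adjacent
                (a stand-in for the semantic interaction matrix)
    """
    results = []
    queue = deque([()])  # start from the empty partial sentence
    while queue:
        partial = queue.popleft()
        if len(partial) == len(rule):
            results.append(list(partial))  # rule fully expanded
            continue
        tag = rule[len(partial)]
        for word in lexicon.get(tag, []):
            # Extend only with words compatible with the previous word.
            if not partial or (partial[-1], word) in compatible:
                queue.append(partial + (word,))
    return results

# Toy data, for illustration only (not from the paper).
lexicon = {"N": ["I", "rice"], "V": ["eat"]}
compatible = {("I", "eat"), ("eat", "rice")}
print(generate_sentences(["N", "V", "N"], lexicon, compatible))
```

Without the compatibility filter, the number of candidates is the product of the lexicon sizes for each tag in the rule, which is why labeling more words per part of speech increases the output so quickly.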

Keywords
